SimpleQA Verified: A Reliable Factuality Benchmark to Measure Parametric Knowledge
Haas, Lukas, Yona, Gal, D'Antonio, Giovanni, Goldshtein, Sasha, Das, Dipanjan
We introduce SimpleQA Verified, a 1,000-prompt benchmark for evaluating Large Language Model (LLM) short-form factuality based on OpenAI's SimpleQA. It addresses critical limitations in OpenAI's benchmark, including noisy and incorrect labels, topical biases, and question redundancy. SimpleQA Verified was created through a rigorous multi-stage filtering process involving de-duplication, topic balancing, and source reconciliation to produce a more reliable and challenging evaluation set, alongside improvements in the autorater prompt. On this new benchmark, Gemini 2.5 Pro achieves a state-of-the-art F1-score of 55.6, outperforming other frontier models, including GPT-5. This work provides the research community with a higher-fidelity tool to track genuine progress in parametric model factuality and to mitigate hallucinations. The benchmark dataset, evaluation code, and leaderboard are available at: https://www.kaggle.com/benchmarks/deepmind/simpleqa-verified.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- South America > Colombia (0.04)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- (7 more...)
- Leisure & Entertainment (1.00)
- Government (0.69)
- Media > Television (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)
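The headline number above is an F1-score over short-form factuality grades. As a rough sketch (the exact aggregation and grading rubric are defined by the benchmark's autorater, and the helper name here is my own), an F1 of this kind can be computed as the harmonic mean of overall accuracy and accuracy on attempted questions, which rewards models for declining to answer rather than hallucinating:

```python
def simpleqa_f1(n_correct: int, n_incorrect: int, n_not_attempted: int) -> float:
    """Harmonic mean of overall accuracy and accuracy-given-attempted.

    A sketch of the F1 aggregation used with SimpleQA-style benchmarks;
    the benchmark's own autorater defines the authoritative rubric.
    """
    total = n_correct + n_incorrect + n_not_attempted
    attempted = n_correct + n_incorrect
    if total == 0 or attempted == 0 or n_correct == 0:
        return 0.0
    overall = n_correct / total              # correct over all prompts
    given_attempted = n_correct / attempted  # correct when the model answered
    return 2 * overall * given_attempted / (overall + given_attempted)

# Illustrative tallies over a 1,000-prompt set (numbers are invented):
score = simpleqa_f1(556, 300, 144)
```

Under this aggregation, a model that answers everything correctly scores 1.0, while attempting nothing scores 0.0 regardless of caution.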
mwBTFreddy: A Dataset for Flash Flood Damage Assessment in Urban Malawi
Chapuma, Evelyn, Mengezi, Grey, Msasa, Lewis, Taylor, Amelia
This paper describes the mwBTFreddy dataset, a resource developed to support flash flood damage assessment in urban Malawi, specifically focusing on the impacts of Cyclone Freddy in 2023. The dataset comprises paired pre- and post-disaster satellite images sourced from Google Earth Pro, accompanied by JSON files containing labelled building annotations with geographic coordinates and damage levels (no damage, minor, major, or destroyed). Developed by the Kuyesera AI Lab at the Malawi University of Business and Applied Sciences, this dataset is intended to facilitate the development of machine learning models tailored to building detection and damage classification in African urban contexts. It also supports flood damage visualisation and spatial analysis to inform decisions on relocation, infrastructure planning, and emergency response in climate-vulnerable regions.
- Africa > Malawi > Southern Region > Blantyre District > Blantyre (0.05)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Africa > Southern Africa (0.04)
- (4 more...)
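The dataset pairs imagery with JSON files of per-building annotations. A minimal sketch of consuming such a file, assuming a hypothetical schema (the field names below are illustrative, not the dataset's documented format):

```python
import json
from collections import Counter

# Hypothetical annotation record; field names are illustrative only.
sample = '''
{
  "image_pair": {"pre": "pre_001.png", "post": "post_001.png"},
  "buildings": [
    {"lat": -15.786, "lon": 35.005, "damage": "destroyed"},
    {"lat": -15.787, "lon": 35.007, "damage": "minor"},
    {"lat": -15.789, "lon": 35.009, "damage": "no damage"}
  ]
}
'''

def damage_summary(annotation_json: str) -> Counter:
    """Tally buildings per damage level from one annotation file."""
    record = json.loads(annotation_json)
    return Counter(b["damage"] for b in record["buildings"])

counts = damage_summary(sample)
```

Summaries like this feed directly into the spatial analysis and damage visualisation uses the abstract describes.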
Using Machine Learning to Detect Fraudulent SMSs in Chichewa
SMS-enabled fraud is of great concern globally. Building machine learning classifiers for SMS fraud requires suitable datasets for model training and validation. Most research has centred on datasets of SMSs in English. This paper introduces a first dataset for SMS fraud detection in Chichewa, a major language in Africa, and reports on experiments with machine learning algorithms for classifying SMSs in Chichewa as fraud or non-fraud. We answer the broader research question of how feasible it is to develop machine learning classification models for Chichewa SMSs. To do that, we created three datasets. A small dataset of SMSs in Chichewa was collected through primary research from a segment of the young population. We applied label-preserving text transformations to increase its size. The enlarged dataset was translated into English using two approaches: human translation and machine translation. The Chichewa and the translated datasets were then classified using random forest and logistic regression. Our findings indicate that both models achieved a promising accuracy of over 96% on the Chichewa dataset. Performance dropped when moving from the Chichewa to the translated dataset. This highlights the importance of data preprocessing, especially in multilingual or cross-lingual NLP tasks, and shows the challenges of relying on machine-translated text for training machine learning models. Our results underscore the importance of developing language-specific models for SMS fraud detection to optimise accuracy and performance. Since most machine learning models require data preprocessing, it is essential to investigate the impact of relying on English-specific tools for data preprocessing.
- Africa > South Africa > Gauteng > Johannesburg (0.04)
- Africa > Kenya (0.04)
- Asia > Pakistan (0.04)
- (12 more...)
- Telecommunications (1.00)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
- Law Enforcement & Public Safety > Fraud (0.86)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
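The abstract mentions label-preserving text transformations used to enlarge the small collected dataset. One simple transformation family of that kind is swapping adjacent words, which perturbs surface form without changing the fraud/non-fraud label; the paper's exact transformation set may differ, and the function below is a sketch:

```python
import random

def augment_sms(text: str, n_variants: int = 2, seed: int = 0) -> list[str]:
    """Generate label-preserving variants of an SMS by swapping one pair
    of adjacent words per variant. Illustrative only; the paper's actual
    transformations are not specified here."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    words = text.split()
    variants = []
    for _ in range(n_variants):
        w = words[:]
        if len(w) > 1:
            i = rng.randrange(len(w) - 1)
            w[i], w[i + 1] = w[i + 1], w[i]  # swap a neighbouring pair
        variants.append(" ".join(w))
    return variants

variants = augment_sms("landirani mphatso yanu lero", n_variants=2)
```

Because each variant is a permutation of the original words, the message's meaning (and hence its label) is largely preserved while the classifier sees new token orderings.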
HumT DumT: Measuring and controlling human-like language in LLMs
Cheng, Myra, Yu, Sunny, Jurafsky, Dan
Should LLMs generate language that makes them seem human? Human-like language might improve user experience, but might also lead to overreliance and stereotyping. Assessing these potential impacts requires a systematic way to measure human-like tone in LLM outputs. We introduce HumT and SocioT, metrics for human-like tone and other dimensions of social perceptions in text data based on relative probabilities from an LLM. By measuring HumT across preference and usage datasets, we find that users prefer less human-like outputs from LLMs. HumT also offers insights into the impacts of anthropomorphism: human-like LLM outputs are highly correlated with warmth, social closeness, femininity, and low status, which are closely linked to the aforementioned harms. We introduce DumT, a method using HumT to systematically control and reduce the degree of human-like tone while preserving model performance. DumT offers a practical approach for mitigating risks associated with anthropomorphic language generation.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Europe > Austria > Vienna (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (14 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.69)
- Health & Medicine > Therapeutic Area (0.68)
- Government (0.68)
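HumT and SocioT are built on relative probabilities from an LM. As a heavily simplified sketch (the paper's exact formulation may differ, and the function name is my own), one can score a text's human-like tone as the mean per-token log-probability gap between two conditional framings:

```python
def relative_tone_score(logp_human: list[float], logp_neutral: list[float]) -> float:
    """Sketch of a HumT-style relative-probability metric: mean per-token
    log-probability of the text under a 'human-like' framing minus the same
    under a neutral framing. Positive => reads as more human-like.
    The log-prob lists would come from any LM scoring API; the numbers
    used below are invented."""
    assert len(logp_human) == len(logp_neutral)
    diffs = [h - n for h, n in zip(logp_human, logp_neutral)]
    return sum(diffs) / len(diffs)

# Toy example with made-up per-token log-probs for a 2-token text:
score = relative_tone_score([-1.0, -2.0], [-2.0, -3.0])
```

A score computed this way is relative, which is what lets DumT use it as a control signal to steer generations toward lower human-likeness without an absolute threshold.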
How AI monitoring is cutting stillbirths and neonatal deaths in a clinic in Malawi
When Ellen Kaphamtengo felt a sharp pain in her lower abdomen, she thought she might be in labour. It was the ninth month of her first pregnancy and she wasn't taking any chances. With the help of her mother, the 18-year-old climbed on to a motorcycle taxi and rushed to a hospital in Malawi's capital, Lilongwe, a 20-minute ride away. At the Area 25 health centre, they told her it was a false alarm and took her to the maternity ward. But things escalated quickly when a routine ultrasound revealed that her baby was much smaller than expected for her pregnancy stage, which can cause asphyxia – a condition that limits blood flow and oxygen to the baby.
- Africa > Malawi > Central Region > Lilongwe District > Lilongwe (0.25)
- North America > United States > Texas (0.07)
SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation
Yao, Zijun, Qi, Weijian, Pan, Liangming, Cao, Shulin, Hu, Linmei, Liu, Weichuan, Hou, Lei, Li, Juanzi
This paper introduces Self-aware Knowledge Retrieval (SeaKR), a novel adaptive RAG model that extracts the self-aware uncertainty of LLMs from their internal states. SeaKR activates retrieval when the LLM presents high self-aware uncertainty during generation. To effectively integrate retrieved knowledge snippets, SeaKR re-ranks them based on the LLM's self-aware uncertainty, preserving the snippet that reduces uncertainty the most. To facilitate solving complex tasks that require multiple retrievals, SeaKR uses this self-aware uncertainty to choose among different reasoning strategies. Our experiments on both complex and simple Question Answering datasets show that SeaKR outperforms existing adaptive RAG methods. We release our code at https://github.com/THU-KEG/SeaKR.
- Africa > Tanzania > Dar es Salaam Region > Dar es Salaam (0.05)
- Africa > Kenya > Nairobi Province (0.04)
- Africa > Kenya > Nairobi City County > Nairobi (0.04)
- (37 more...)
- Media > Film (1.00)
- Media > Television (0.68)
- Media > Music (0.68)
- (4 more...)
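The control flow described in the abstract — generate, check self-aware uncertainty, retrieve and re-rank only when uncertainty is high — can be sketched in a few lines. All callables below are stand-ins for model internals; the real method extracts uncertainty from LLM hidden states rather than from a user-supplied function:

```python
def seakr_style_generate(question, generate, uncertainty, retrieve, threshold=0.5):
    """Sketch of uncertainty-gated adaptive RAG in the spirit of SeaKR.

    generate(question, context)   -> answer string
    uncertainty(question, answer) -> float in [0, 1], higher = less confident
    retrieve(question)            -> list of candidate snippets
    """
    answer = generate(question, context=None)
    if uncertainty(question, answer) <= threshold:
        return answer  # confident enough: skip retrieval entirely
    snippets = retrieve(question)
    # Re-rank: keep the snippet that reduces self-aware uncertainty the most.
    best = min(snippets,
               key=lambda s: uncertainty(question, generate(question, context=s)))
    return generate(question, context=best)

# Toy stand-ins to exercise the control flow (purely illustrative):
def toy_generate(q, context=None):
    return f"{q} -> {context if context else 'draft'}"

def toy_uncertainty(q, answer):
    return 0.1 if "rainfall records" in answer else 0.9

answer = seakr_style_generate(
    "When did the flood peak?",
    toy_generate,
    toy_uncertainty,
    lambda q: ["old news article", "rainfall records"],
)
```

The gating is what makes the method adaptive: confident generations pay no retrieval cost, and retrieval quality is judged by its effect on the model's own uncertainty rather than by lexical overlap.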
The AI revolution comes for farmers growing a third of our food
In the village of Ndodo, 40 kilometers south of the Malawian capital Lilongwe, farmers gather in the shade of an acacia tree as a voice over a smartphone tells them how to get rid of a weevil that's destroying their sweet potato crops. The tips, offered by the app in the local language Chichewa, are among the first examples of how artificial intelligence is being used to aid subsistence farmers in some of the poorest parts of the world. Piloted by Opportunity International, a Chicago-based nonprofit organization, the app, called Ulangizi -- which translates as "Advice" -- works on WhatsApp and uses data from ChatGPT and the Malawian government's English-language agricultural manual to answer questions or diagnose crop and farm animal diseases. "The majority of our people do not know how to read or write," said Anna Chimalizeni, a 36-year-old mother of three, who as a government farmer-support agent demonstrates the app to farmers. "I am there to help them write issues they have at their farms and read the response on their behalf. They also have a chance to listen to the response through voice notes which come in our own local language."
- North America > United States > Illinois > Cook County > Chicago (0.29)
- Africa > Malawi > Central Region > Lilongwe District > Lilongwe (0.29)
Development of Semantics-Based Distributed Middleware for Heterogeneous Data Integration and its Application for Drought
Drought is a complex environmental phenomenon that affects millions of people and communities all over the globe and remains too elusive to be accurately predicted. This is mostly due to the scale and variability of the web of environmental parameters that directly or indirectly cause the onset of different categories of drought. Since the dawn of man, efforts have been made to understand the natural indicators that provide signs of likely environmental events. These indicators, in the form of indigenous knowledge systems, have been used for generations. The intricate complexity of drought has, however, always been a major stumbling block for accurate drought prediction and forecasting systems. Recently, scientists in the fields of agriculture and environmental monitoring have been discussing the integration of indigenous knowledge and scientific knowledge into more accurate environmental forecasting systems, in order to incorporate diverse environmental information for a reliable drought forecast. Hence, the core objective of this research is the development of a semantics-based data integration middleware that encompasses and integrates heterogeneous data models of local indigenous knowledge and sensor data towards an accurate drought forecasting system for the study areas. The local indigenous knowledge on drought gathered from domain experts is transformed into rules used to perform deductive inference, in conjunction with sensor data, to determine the onset of drought through an automated inference generation module of the middleware. The semantic middleware incorporates, inter alia, a distributed architecture that consists of a streaming data processing engine based on Apache Kafka for real-time stream processing, a rule-based reasoning module, and an ontology module for semantic representation of the knowledge bases.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York > New York County > New York City (0.13)
- Africa > Sub-Saharan Africa (0.04)
- (50 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
- Personal (0.92)
- Health & Medicine (1.00)
- Government (1.00)
- Food & Agriculture > Agriculture (1.00)
- (3 more...)
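The rule-based reasoning step — indigenous knowledge encoded as rules, fired against sensor-derived facts — is classic forward chaining. A minimal sketch (the rule contents below are invented for illustration, not taken from the paper):

```python
# Each rule: (set of antecedent facts, conclusion). Rule contents are
# illustrative stand-ins for indigenous-knowledge indicators.
RULES = [
    ({"low_rainfall", "early_acacia_flowering"}, "drought_onset_likely"),
    ({"soil_moisture_below_threshold", "drought_onset_likely"}, "issue_drought_alert"),
]

def infer(facts: set) -> set:
    """Forward-chain over RULES until no new conclusions can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, conclusion in RULES:
            if antecedents <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived
```

In the middleware, the antecedent facts would arrive from the Kafka stream-processing engine, and the derived conclusions would drive the drought forecast; chaining lets a conclusion from one rule (here, `drought_onset_likely`) satisfy the antecedents of another.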
Flickr Africa: Examining Geo-Diversity in Large-Scale, Human-Centric Visual Data
Naggita, Keziah, LaChance, Julienne, Xiang, Alice
Biases in large-scale image datasets are known to influence the performance of computer vision models as a function of geographic context. To investigate the limitations of standard Internet data collection methods in low- and middle-income countries, we analyze human-centric image geo-diversity on a massive scale using geotagged Flickr images associated with each nation in Africa. We report the quantity and content of available data with comparisons to population-matched nations in Europe as well as the distribution of data according to fine-grained intra-national wealth estimates. Temporal analyses are performed at two-year intervals to expose emerging data trends. Furthermore, we present findings for an "othering" phenomenon as evidenced by a substantial number of images from Africa being taken by non-local photographers. The results of our study suggest that further work is required to capture image data representative of African people and their environments and, ultimately, to improve the applicability of computer vision models in a global context.
- Asia > Brunei (0.14)
- North America > Canada > Quebec > Montreal (0.06)
- Africa > Sierra Leone (0.06)
- (142 more...)
- Health & Medicine (0.92)
- Information Technology > Services (0.75)
- Government > Regional Government (0.46)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
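The population-matched comparison the abstract describes amounts to normalizing image counts by population and comparing the resulting rates across a matched pair of countries. A toy sketch (country names and all numbers below are invented, not the study's data):

```python
def images_per_capita(image_counts: dict, populations: dict) -> dict:
    """Geotagged images per person, per country."""
    return {c: image_counts[c] / populations[c] for c in image_counts}

# Hypothetical population-matched pair; counts and populations are invented.
counts = {"CountryA": 1_200, "CountryB": 240_000}
pops = {"CountryA": 10_000_000, "CountryB": 10_000_000}

rates = images_per_capita(counts, pops)
coverage_gap = rates["CountryB"] / rates["CountryA"]  # times more data per person
```

Gaps of this kind are what make models trained on such data perform unevenly across geographic contexts.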
Exploring the Benefits of Training Expert Language Models over Instruction Tuning
Jang, Joel, Kim, Seungone, Ye, Seonghyeon, Kim, Doyoung, Logeswaran, Lajanugen, Lee, Moontae, Lee, Kyungjae, Seo, Minjoon
Recently, Language Models (LMs) instruction-tuned on multiple tasks, also known as multitask-prompted fine-tuning (MT), have shown the capability to generalize to unseen tasks. Previous work has shown that scaling the number of training tasks is the key component in making stronger MT LMs. In this work, we report an unexpected finding that an expert LM fine-tuned on just a single task can outperform an MT LM trained with 300+ different tasks on 11 different unseen datasets and on 13 datasets of the BIG-bench benchmark by a mean accuracy of 3.20% and 1.29%, respectively. This finding casts doubt on the previously held belief that simply scaling the number of tasks makes stronger MT LMs. Leveraging this finding, we further show that this distributed approach of training a separate expert LM per training task instead of a single MT LM for zero-shot inference possesses many benefits including (1) avoiding negative task transfer that often occurs during instruction tuning, (2) being able to continually learn new tasks without having to re-train on previous tasks to avoid catastrophic forgetting, and (3) showing compositional capabilities when merging individual experts together. The code is available at https://github.com/joeljang/ELM.
- Europe > France (0.28)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Italy > Tuscany > Florence (0.04)
- (24 more...)
- Research Report (1.00)
- Overview (0.93)
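One of the compositional benefits listed above is merging individual experts. A common simple scheme for this is uniform parameter averaging; the sketch below uses plain lists of floats as stand-ins for real weight tensors, and the paper's exact merging method may differ:

```python
def merge_experts(expert_weights: list[dict]) -> dict:
    """Uniformly average per-task expert LMs, parameter by parameter.

    Each expert is a {parameter_name: list_of_floats} mapping standing in
    for real tensors. Assumes all experts share the same architecture
    (same names and shapes).
    """
    n = len(expert_weights)
    merged = {}
    for name in expert_weights[0]:
        columns = zip(*(expert[name] for expert in expert_weights))
        merged[name] = [sum(vals) / n for vals in columns]
    return merged

# Two toy "experts" with a single 2-element parameter each:
merged = merge_experts([{"w": [1.0, 3.0]}, {"w": [3.0, 5.0]}])
```

Because each expert was trained in isolation, averaging sidesteps the negative task transfer that joint multitask training can suffer, at the cost of assuming the experts' parameters live in compatible regions of weight space.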